Stack Overflow Documentation Data Dump

by: Stack Exchange, Inc.

Publication date: 2017-09-08

Usage: Attribution-Share Alike 3.0

Topics: stackoverflow documentation

Collection: opensource

Language: English

Addeddate: 2017-09-08 17:40:39

Identifier: documentation-dump.7z

Identifier-ark: ark:/13960/t0jt5vn75

Scanner: Internet Archive HTML5 Uploader 1.6.3

Year: 2017

Reviews

Reviewer: Wikiod - April 15, 2021
Subject: Created a wiki website

I have created a wiki website from this data, which I would like to improve into something better: https://www.wikiod.com/w/Main_Page

Reviewer: Iam53 - September 12, 2020
Subject: Relocation

To get the latest working copy or to update the contents, go to https://www.programming-books.io/index-grid

Reviewer: RIP Tutorial - February 21, 2018
Subject: Website created from the SO Data Dump

A website from this dump has been created: http://www.riptutorial.com/

It’s currently read-only, but I plan to make a wiki from it so that people can edit examples and embed live examples using a fiddle such as .NET Fiddle, SQL Fiddle or JS Fiddle, which I believe was sorely missing from what was meant to be an “example first” documentation.

Reviewer: Tony Dallimore - October 24, 2017
Subject: Easy to use but unusable dates

Summary

I found it relatively easy to extract the data I wanted from the archive. My only serious criticism is the date format, which is obscure to non-Unix users. I do not need dates but, if they are thought to be important, I believe the archive should be recreated with dates in a standard format such as ISO 8601.

Detail

I had started writing an introduction to Outlook VBA within Stack Overflow Documentation, which I did not wish to lose. I had started extracting my text from the website but had not finished when the documentation was taken down, so to recover my text I would have to extract it from the archive.

File “documentation-dump.7z” was easy to download. WinZip extracted its contents just as easily. No doubt, your favourite extraction utility will work just as effectively.

The file “readme.txt” seemed the obvious starting point. This file lists the other files, each with what looks like a list of field names. There is no other documentation that I have found. I have decoded the files of interest to me without difficulty, so perhaps no more documentation was necessary. On the other hand, I have had much practice at decoding undocumented files, so others may find this lack of documentation more troubling.

Most of the files had an extension of “json” which meant nothing to me. A search for “json” found http://json.org/ which provided an adequate definition of the format of JavaScript Object Notation which is a lightweight data-interchange format. Before the days of XML I was a specialist in electronic data interchange so again I may not be the best judge of the adequacy of this definition.

Starting with the first file, “contributors.json”, I found:

[
{
"Id": 1,
"DocTopicId": 1,
"UserId": 80572,
"DocContributorTypeId": 2,
"CreationDate": "\/Date(1446697142040-0500)\/"
},
{
"Id": 2,
and so on

With my newly acquired knowledge of this format, I knew “{name/value pair, name/value pair, …}” was an object and “[” was the start of an array, so the file was an array of these simple objects. The names and values looked obvious enough, with the one exception of "\/Date(1446697142040-0500)\/".
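The array-of-objects reading can be checked mechanically. The following is a minimal sketch in Python (my choice of language, not the reviewer's, who worked in VBA); the sample text is the first object from the excerpt above, closed off by hand so that it parses on its own.

```python
import json

# One object from the "contributors.json" excerpt, wrapped in an array.
# A raw string keeps the backslashes exactly as they appear in the file.
sample = r'''
[
  {
    "Id": 1,
    "DocTopicId": 1,
    "UserId": 80572,
    "DocContributorTypeId": 2,
    "CreationDate": "\/Date(1446697142040-0500)\/"
  }
]
'''

records = json.loads(sample)

# "[" ... "]" becomes a Python list; "{" ... "}" becomes a dict.
print(type(records).__name__, type(records[0]).__name__)  # prints: list dict

# "\/" is a JSON escape for "/", so the decoded value has no backslashes.
print(records[0]["CreationDate"])  # prints: /Date(1446697142040-0500)/
```

The second print shows that the backslashes belong to the JSON encoding rather than to the value itself: "\/" is simply an escaped "/".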

When searching for “json”, “json date format” is the first suggestion. Apparently, "\/Date(1446697142040)\/" is the number of milliseconds since 00:00 UTC on 1 January 1970. I can find nothing to explain “-0500”, which seems to be a private SO addition; I assume it has something to do with a time zone. Apparently, JSON does not define a date format, and conventions have changed over the years. However, a Stack Overflow answer with a score of 1140 recommends ISO 8601’s format: “2012-04-23T18:25:43.511Z”. Apparently this format is endorsed by everyone that matters. The only reason for considering milliseconds since 1970 seems to be that it is a standard within Unix and even the oldest libraries have routines that can read it. This is not a standard that appeals to non-Unix users. I do not know if dates are important to anyone; certainly they are not important to me. If they are thought to be important, I believe the archive should be recreated with dates in ISO 8601 format so everyone can use them.
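For anyone who does need the dates, converting them to ISO 8601 takes only a few lines. The sketch below is Python, and the function name parse_dump_date is hypothetical. It rests on one assumption that the archive's readme does not confirm: that a suffix such as “-0500” is a UTC offset (sign, hours, minutes) that affects only how the instant is displayed, while the number always counts milliseconds from 1970-01-01 00:00 UTC.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches "/Date(1446697142040-0500)/", with or without the JSON "\/" escapes.
# The offset part is optional.
_DATE_RE = re.compile(r"^\\?/Date\((-?\d+)([+-]\d{4})?\)\\?/$")


def parse_dump_date(value: str) -> datetime:
    """Convert a "/Date(ms+hhmm)/" style string to a timezone-aware datetime."""
    match = _DATE_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognised date string: {value!r}")
    millis = int(match.group(1))
    offset_text = match.group(2)

    # The number counts milliseconds from 1970-01-01 00:00 UTC.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    instant = epoch + timedelta(milliseconds=millis)
    if offset_text is None:
        return instant

    # ASSUMPTION: the suffix is a UTC offset that changes only the display,
    # not which instant is meant.
    sign = -1 if offset_text[0] == "-" else 1
    offset = sign * timedelta(hours=int(offset_text[1:3]),
                              minutes=int(offset_text[3:5]))
    return instant.astimezone(timezone(offset))


stamp = parse_dump_date("/Date(1446697142040-0500)/")
print(stamp.isoformat(timespec="milliseconds"))
# prints: 2015-11-04T23:19:02.040-05:00
print(stamp.astimezone(timezone.utc).isoformat(timespec="milliseconds"))
# prints: 2015-11-05T04:19:02.040+00:00
```

If the assumption about the suffix is wrong, only the displayed offset changes; the UTC value on the last line depends solely on the millisecond count.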

I searched “contributors.json” for my user id and found enough occurrences to match the number of examples I had written.
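The same search can be done on the parsed data rather than on the raw text. This is a minimal Python sketch; the helper names are hypothetical, and the sample list is a three-object stand-in for “contributors.json” built from values quoted in this review.

```python
import json


def load_records(path):
    """Read one of the dump's JSON files (a top-level array of objects)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def contributions_for(records, user_id):
    """Return the contributor objects whose "UserId" equals user_id."""
    return [rec for rec in records if rec.get("UserId") == user_id]


# Stand-in for load_records("contributors.json"), so the sketch runs
# without the archive being present.
sample = [
    {"Id": 1, "DocTopicId": 1, "UserId": 80572, "DocContributorTypeId": 2},
    {"Id": 79143, "DocExampleId": 26136, "UserId": 973283, "DocContributorTypeId": 2},
    {"Id": 79144, "DocTopicId": 8111, "UserId": 973283, "DocContributorTypeId": 2},
]

mine = contributions_for(sample, 973283)
print(len(mine))  # prints: 2
```

Comparing the parsed "UserId" value avoids false matches that a plain text search could produce when the same digits occur in another field.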

I tend to use Excel and VBA for this type of investigation. VBA is an adequate language and Excel worksheets are a convenient repository for poorly understood data. I wrote code to extract each object containing my user id and save it as a row within an Excel worksheet. The first few rows and columns of that worksheet contain:

Row|  A  |     B    |      C     |   D  |          E         |             F            |
   |-----+----------+------------+------+--------------------+--------------------------|
  1|   Id|DocTopicId|DocExampleId|UserId|DocContributorTypeId|CreationDate              |
   |-----+----------+------------+------+--------------------+--------------------------|
  2|79143|          |       26136|973283|                   2|/Date(1484774756887-0500)/|
   |-----+----------+------------+------+--------------------+--------------------------|
  3|79144|      8111|            |973283|                   2|/Date(1484774756887-0500)/|
   |-----+----------+------------+------+--------------------+--------------------------|
  4|79145|          |       27558|973283|                   2|/Date(1484774756887-0500)/|
   |-----+----------+------------+------+--------------------+--------------------------|
  5|79350|          |       27628|973283|                   2|/Date(1485051525857-0500)/|
   |-----+----------+------------+------+--------------------+--------------------------|

Sorry if the above is difficult to read. SO's pre-formatting of text does not work here.
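A worksheet like the one above can be approximated without Excel by writing the selected objects as CSV, which Excel opens directly. The Python sketch below substitutes CSV for the reviewer's VBA-and-worksheet approach. It assumes that an empty cell corresponds to a field that is absent (or null) in the object; the review does not say which.

```python
import csv
import io

# Column order copied from row 1 of the worksheet shown above.
COLUMNS = ["Id", "DocTopicId", "DocExampleId", "UserId",
           "DocContributorTypeId", "CreationDate"]


def rows_to_csv(records, columns=COLUMNS) -> str:
    """Render records as CSV text, leaving absent fields empty."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Worksheet rows 2 and 3, written as the objects they presumably came from.
records = [
    {"Id": 79143, "DocExampleId": 26136, "UserId": 973283,
     "DocContributorTypeId": 2, "CreationDate": "/Date(1484774756887-0500)/"},
    {"Id": 79144, "DocTopicId": 8111, "UserId": 973283,
     "DocContributorTypeId": 2, "CreationDate": "/Date(1484774756887-0500)/"},
]

print(rows_to_csv(records))
# prints:
# Id,DocTopicId,DocExampleId,UserId,DocContributorTypeId,CreationDate
# 79143,,26136,973283,2,/Date(1484774756887-0500)/
# 79144,8111,,973283,2,/Date(1484774756887-0500)/
```

To produce a file rather than a string, the same DictWriter can be pointed at a file opened with newline="" and encoding="utf-8".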

The column “DocExampleId” looked interesting so I looked at the file “examples.json” which starts:

[
{
"Id": 1,
"DocTopicId": 1,
"Title": "Basic Usage",
"CreationDate": "\/Date(1446697142040-0500)\/",
"LastEditDate": "\/Date(1469351669667-0400)\/",
"Score": 6,
"ContributorCount": 2,
"BodyHtml": "using StackExchange.Redis;\r\n\r\n// ...\r\n\r\n// connect to the server\r\nConnectionMultiplexer connection = ConnectionMultiplexer.Connect("localhost");\r\n\r\n// select a database (by default, DB = 0)\r\nIDatabase db = connection.GetDatabase();\r\n\r\n// run a command, in this case a GET\r\nRedisValue myVal = db.StringGet("mykey");\r\n\r\n\r\n",
"BodyMarkdown": " using StackExchange.Redis;\n\n // ...\n\n // connect to the server\n ConnectionMultiplexer connection = ConnectionMultiplexer.Connect(\"localhost\");\n \n // select a database (by default, DB = 0)\n IDatabase db = connection.GetDatabase();\n\n // run a command, in this case a GET\n RedisValue myVal = db.StringGet(\"mykey\");",
"IsPinned": false
},
{
"Id": 2,
"DocTopicId": 2,

Again, this is an array of objects, with “BodyHtml” perhaps holding the text I sought. I searched for one of the DocExampleIds recorded against my UserId and found the HTML I sought.

I wrote code to extract each example object for which I was listed as a contributor, saving each one as a row within another Excel worksheet. This worksheet contained all the text I wanted, together with links to my images.
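The two-file lookup described here amounts to a simple join. The Python sketch below assumes that “DocExampleId” in “contributors.json” refers to “Id” in “examples.json”, which is what the reviewer's successful search suggests but the readme is not quoted as confirming. The function name and the two example objects are hypothetical stand-ins, not data from the archive.

```python
def examples_for_user(contributors, examples, user_id):
    """Return the example objects that list user_id as a contributor."""
    # Collect the example ids recorded against this user. Topic-level
    # contributor rows have no DocExampleId and are skipped.
    wanted = {
        rec["DocExampleId"]
        for rec in contributors
        if rec.get("UserId") == user_id and rec.get("DocExampleId") is not None
    }
    # ASSUMPTION: DocExampleId matches the "Id" field of an example object.
    return [ex for ex in examples if ex.get("Id") in wanted]


contributors = [
    {"Id": 79143, "DocExampleId": 26136, "UserId": 973283},
    {"Id": 79144, "DocTopicId": 8111, "UserId": 973283},
    {"Id": 1, "DocTopicId": 1, "UserId": 80572},
]

# Made-up example objects; only the Id values echo the worksheet above.
examples = [
    {"Id": 1, "Title": "Basic Usage", "BodyHtml": "<p>placeholder</p>"},
    {"Id": 26136, "Title": "Hypothetical title", "BodyHtml": "<p>placeholder</p>"},
]

for example in examples_for_user(contributors, examples, 973283):
    print(example["Id"], example["Title"])  # prints: 26136 Hypothetical title
```

Building the set of wanted ids first means “examples.json” is scanned only once, however many examples the user contributed to.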

I had problems reading “examples.json”, which is 92 MB and uses UTF-8 encoding. I suspect my problems are unique to VBA. If you are interested in these problems, see: https://stackoverflow.com/q/46838258/973283.
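In languages with built-in UTF-8 support the encoding is less of an obstacle, provided it is stated explicitly rather than left to the platform default. The Python sketch below uses a hypothetical helper name and demonstrates it on a small temporary file with a made-up record. I have not run it against the real “examples.json”, so treat the claim that a file of that size loads comfortably in one call as an expectation, not a measurement.

```python
import json
import tempfile
from pathlib import Path


def load_dump_file(path):
    """Load one JSON file from the dump, decoding it explicitly as UTF-8."""
    # "utf-8-sig" also reads plain UTF-8; it merely skips a leading
    # byte-order mark if one happens to be present.
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


# Self-contained demonstration: a temporary file stands in for
# "examples.json", and the record in it is made up for the demo.
with tempfile.TemporaryDirectory() as folder:
    demo = Path(folder) / "demo.json"
    demo.write_text('[{"Id": 1, "Title": "Caf\u00e9"}]', encoding="utf-8")
    loaded = load_dump_file(demo)

print(len(loaded), "record(s) loaded")  # prints: 1 record(s) loaded
```

Naming the encoding matters most on Windows, where the default text encoding is often not UTF-8 and non-ASCII characters would otherwise be misread.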

Conclusion

The summary section is my review of the SO documentation archive. The detail section is intended to justify my review and to help and encourage anyone who wishes to extract information from this archive but is intimidated by the format.

17,366 Views

22 Favorites

4 Reviews

IN COLLECTIONS

Community Texts

Community Collections

Uploaded by Stack Exchange on September 8, 2017
